Part of Speech Tagger for Kannada

Abstract

Part-of-speech (POS) tagging is a well-understood problem in NLP. Its importance stems from the fact that POS tagging is one of the first stages performed by many natural language processing pipelines. POS tagging is the process of assigning a part-of-speech tag or other lexical class marker to every word in a sentence, and it plays a crucial role in many areas of NLP, including machine translation (MT). In linguistics, part-of-speech tagging, also termed grammatical tagging or word-category disambiguation, is the process of marking up the words in a text or corpus as corresponding to a particular part of speech, based on both each word's definition and its context, that is, its relationship with adjacent and related words in a phrase, sentence, or paragraph. In other words, it is the automatic annotation of each word in a corpus with its syntactic category. It is similar to the process of tokenization for computer languages. A part of speech is a grammatical category, commonly including verbs, nouns, adjectives, adverbs, determiners, and so on.
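As a rough illustration of tagging by "definition and context", the sketch below looks each word up in a tiny lexicon (its definition) and uses the previous tag (its context) to pick among ambiguous candidates. The lexicon entries, the tag names, the two context rules, and the function name `tag` are all hypothetical choices made for this example. It uses English words for readability and is not the method of the paper.

```python
# Toy lexicon: each word maps to the tags it may take, most likely first.
# All entries are hypothetical and exist only for this illustration.
LEXICON = {
    "the": ["DET"],
    "a": ["DET"],
    "i": ["PRON"],
    "can": ["AUX", "NOUN"],
    "book": ["NOUN", "VERB"],
    "flight": ["NOUN"],
}


def tag(words):
    """Assign one tag per word using the lexicon plus the previous tag."""
    tags = []
    for word in words:
        # Unknown words fall back to NOUN (an assumption of this sketch).
        candidates = LEXICON.get(word.lower(), ["NOUN"])
        prev = tags[-1] if tags else None
        if prev == "DET" and "NOUN" in candidates:
            # After a determiner, prefer the noun reading.
            choice = "NOUN"
        elif prev in ("PRON", "AUX") and "VERB" in candidates:
            # After a pronoun or auxiliary, prefer the verb reading.
            choice = "VERB"
        else:
            # Otherwise take the first listed candidate.
            choice = candidates[0]
        tags.append(choice)
    return list(zip(words, tags))


if __name__ == "__main__":
    print(tag(["I", "can", "book", "a", "flight"]))
    print(tag(["the", "book"]))
```

Here "book" is tagged as a verb after the auxiliary "can" and as a noun after the determiner "the", which is the disambiguation the abstract describes. Practical taggers learn such preferences from annotated corpora instead of hand-written rules.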

Similar resources

A Maximum Entropy Approach to Kannada Part Of Speech Tagging

Part Of Speech (POS) tagging is the most important pre-processing step in almost all Natural Language Processing (NLP) applications. It is defined as the process of classifying each word in a text with its appropriate part of speech. In this paper, the probabilistic classifier technique of Maximum Entropy model is experimented for the tagging of Kannada sentences. Kannada language is agglutinat...

Cross Language POS Taggers (and other Tools) for Indian Languages: An Experiment with Kannada using Telugu Resources

Indian languages are known to have a large speaker base, yet some of these languages have minimal or non-efficient linguistic resources. For example, Kannada is relatively resource-poor compared to Malayalam, Tamil and Telugu, which in turn are relatively poor compared to Hindi. Many Indian language pairs exhibit high similarities in morphology and syntactic behaviour, e.g. Kannada is highly sim...

Language Identification of Kannada Language using N-Gram

Language identification is an important pre-processing step for any Natural Language Processing task. Kannada is an Indian language, and a lot of research is being carried out on Kannada language processing. Major parts of online documents such as websites are a combination of Kannada and English sentences. Language identification is a preprocessing step for NLP tasks like POS tagging, Sentenc...

Morpheme Segmentation for Kannada Standing on the Shoulder of Giants

This paper studies the applicability of a set of state-of-the-art unsupervised morphological segmentation algorithms for the problem of morpheme boundary detection in Kannada, a resource-poor language with highly inflectional and agglutinative morphology. The choice of the algorithms for the experiment is based in part on their performance with highly inflected languages such as Finnish and Ben...

OCR for printed Kannada text to Machine editable format using Database approach

This paper describes an Optical Character Recognition (OCR) system for printed text documents in Kannada, a South Indian language. The proposed OCR system recognizes printed Kannada text and can handle all types of Kannada characters. The system first extracts an image of the Kannada script, then segments the image into lines, then segments the words into sub-character level piece...

Publication date: 2013